THE EXISTENCE OF A STATIONARY ε-OPTIMAL POLICY FOR A FINITE MARKOV CHAIN
Author
Abstract
In this paper we investigate the problem of optimal control of a Markov chain with a finite number of states when the control sets are compact sets in a metric space. The goal of the control is to maximize the average reward per unit step. For the case of finite control and state sets the existence of a stationary optimal policy was proved in [1] and [2]. In [3]-[5] it was proved that for a controlled Markov process with finite state space, compact control sets and continuous reward and transition functions there may not exist an optimal policy. In this paper it is proved that if the state space is finite, the control sets are compact, the transition functions are continuous and the reward functions are upper semicontinuous, then for any positive ε there exists a stationary ε-optimal policy. Here the average reward may be understood as either the lower or the upper limit of the average reward per unit step. For the case of the lower limit the existence of a stationary ε-optimal policy was proved in [4]. Examples in [3] and [4] show that if the above restrictions on the control sets, the transition functions and the reward functions are not satisfied, there may not exist a stationary ε-optimal policy for some positive ε. If the state space is not finite then, as the example in [6] shows, there may not be a stationary ε-optimal policy even in the case of finite control sets. Observe that if the number of states is two, then, according to [7], under the assumptions made in this paper there exists a stationary optimal policy. Sufficient conditions for the existence of stationary optimal policies were studied in [7]-[9]; these impose certain additional restrictions on the control sets beyond the requirements of the present paper.
In [8] it was proved that for compact convex control sets coinciding with the sets of transition probabilities and concave continuous reward functions there exists a stationary optimal policy if every stationary policy defines an ergodic Markov chain without transient states. In [9] it was shown that under the condition derived in [8] it is sufficient to require that at least one stationary policy, rather than every one, define an ergodic Markov chain without transient states. In [7] two sufficient conditions were given. These conditions consist in the fact that in addition to the assumptions of the …
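The criterion in the abstract, the average reward per unit step under a stationary policy, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-state model `toy_chain`, the helper `average_reward`, and the grid over the control interval [0, 1] are hypothetical. The code approximates the limit by a finite-horizon average and only compares finitely many stationary policies, so it does not establish ε-optimality over all policies.

```python
def average_reward(P, r, horizon=2000):
    """Finite-horizon approximation of the average reward per unit step,
    (1/N) * sum_{n<N} E[r(x_n)], for each initial state of a finite chain
    with transition matrix P and one-step reward vector r."""
    n = len(r)
    total = [0.0] * n   # running sum of expected rewards, per start state
    v = list(r)         # v = P^k r: expected reward at step k
    for _ in range(horizon):
        total = [t + x for t, x in zip(total, v)]
        v = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return [t / horizon for t in total]


def toy_chain(a):
    """Hypothetical two-state model (not from the paper): in state 0 the
    control a in [0, 1] is the probability of moving to state 1; state 1
    always returns to state 0. The reward is 1 in state 1, 0 in state 0."""
    P = [[1.0 - a, a], [1.0, 0.0]]
    r = [0.0, 1.0]
    return P, r


# Compare stationary policies on a finite grid of the compact control set.
grid = [k / 4 for k in range(5)]
values = {a: average_reward(*toy_chain(a))[0] for a in grid}
best_action = max(values, key=values.get)
best_value = values[best_action]
print(best_action, round(best_value, 3))
```

In this toy model the transition probabilities and rewards depend continuously on the control, so the assumptions of the abstract hold trivially; the sketch only shows how the quantity being maximized is evaluated for a fixed stationary policy.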
Similar resources
Weak conditions for the existence of optimal stationary policies in average Markov decision chains with unbounded costs
Average cost Markov decision chains with discrete time parameter are considered. The cost function is unbounded and satisfies an additional condition which frequently holds in applications. Also, we assume that there exists a single stationary policy for which the corresponding Markov chain is irreducible and ergodic with finite average cost. Within this framework, the existence of an average c...
Relative Entropy Rate between a Markov Chain and Its Corresponding Hidden Markov Chain
In this paper we study the relative entropy rate between a homogeneous Markov chain and a hidden Markov chain defined by observing the output of a discrete stochastic channel whose input is the finite state space homogeneous stationary Markov chain. For this purpose, we obtain the relative entropy between two finite subsequences of the above-mentioned chains with the help of the definition of...
Optimal Stationary Policies in Risk-sensitive Dynamic Programs with Finite State Space and Nonnegative Rewards
This work concerns controlled Markov chains with finite state space and nonnegative rewards; it is assumed that the controller has a constant risk-sensitivity, and that the performance of a control policy is measured by a risk-sensitive expected total-reward criterion. The existence of optimal stationary policies is studied within this context, and the main result establishes the optimality of ...
An Optimal Tax Relief Policy with Aligning Markov Chain and Dynamic Programming Approach
In this paper, a Markov chain and dynamic programming were used to model a suitable pattern for tax relief and the reduction of tax evasion, based on tax earnings in Iran from 2005 to 2009. Applying this model showed that tax evasion was 6714 billion Rials. With a 4% relief to taxpayers, and calculating the present value of the received tax, it was reduced to 3108 billion Rials. ...
Interval Methods for Uncertain Markov Decision Processes
In this paper, the average case of Markov decision processes with uncertainty is considered. That is, a controlled Markov set-chain model with a finite state and action space is developed by an interval arithmetic analysis, and we find a Pareto optimal policy which maximizes the average expected rewards over all stationary policies under a new partial order. The Pareto optimal policies is...